CU Amiga Super CD-ROM 2

Source: CU Amiga Magazine's Super CD-ROM 02 (1996, EMAP Images, GB), issue 1996-04, file magazine/amiga_e/modulesmc1/estuff.doc

Text File | 1994-11-21 | 13KB | 283 lines

These comments and ideas are based on the premise that faster is better. It's reasonable to assume that if well-thought-out assembly language replaces parts of a program written in a higher-level language, the program will run faster. It will usually be smaller too, which is no bad thing. The E language makes incorporating assembly language extremely easy.

For maximum speed, the first thing is to have a good algorithm, i.e. write the program in E and make it run as fast as possible before even thinking about assembly language. The best assembly language program may well be a dog if the algorithm is lousy to start with.

Use registers as much as possible. The simple-minded approach is to just put OPT REG=5 at the top of the program. This lets EC use its own judgment about which registers to use for what. Without this option, variables are stored at offsets from register A4 or A5, so accessing them is slower than if they were kept in data registers, of which EC uses D3 through D7. Time your program both with and without OPT REG=5, and note the difference in speed and size. Try it again with REG=4 or REG=3, which in some cases may be slightly faster. If there are more variables than the REG option allows for, EC does a good job of deciding which ones should go into registers, but you may be able to make a more educated choice than EC can. If you know that a certain variable should be in a register, use DEF var:REG to force it there. You can do this with no more than 5 variables, of course.

It doesn't make much sense to translate an entire program into assembly language if most of its running time is spent churning around and around in one small part of it. If you aren't sure which part of your program takes up most of the time, profile it. The E distribution provides a great profiler, AProf, in the Bin directory. Documentation for AProf is in Tools/AProf.
To use AProf, the executable file being profiled must have a symbol table, which you get with the EC option "sym". To make this easy, use the alias capability of AmigaDOS: type in "alias ecs ec sym", or better yet, put that line in your startup file.

Once you know which part of the program you are going to concentrate on, the easiest way to get started is to see how EC already codes that part, and try to figure out more economical ways to code it. To do this, you need to disassemble an executable file. There are many debuggers and disassemblers in the public domain. My first recommendation is ADIS, by Martin Apel (email: apel@physik.uni-kl.de). I know that it is in the public domain, but have completely forgotten where I got it. It's well worth looking for. In many cases it produces files which are essentially ready for reassembly. To use it with E, the -ml option is necessary. Again an alias is appropriate: "alias adis adis -ml". I use "alias adis adis -ml -c2" because I have an Amiga 1200, which uses the 68020 chip. Actually, the -c2 option is necessary only when you are disassembling programs that contain instructions the 68020 has but the 68000 lacks. Not many programmers use them, which makes sense if you want backward compatibility. The only ones I have ever used are some of the 32-bit divide and multiply instructions.

Disassembling isn't much use if you then can't find the code you are planning to improve. No problem: surround the code with NOPs. NOP is the 68000 instruction for "do nothing", and it can go anywhere in the program. Load the disassembled program into your text editor and search for NOP (or nop, if that's what your disassembler puts out). I use Matt Dillon's DME, which can find a NOP almost instantaneously.
Example:

    PROC main()
      DEF i:REG
      NOP
      i++
      NOP
    ENDPROC

results in a bunch of stuff, including:

    main    LINK    A5,#$0
            MOVEM.L D7,-(SP)
            NOP
            ADDQ.L  #$1,D7
            NOP
            MOVEQ   #$0,D0
            MOVEM.L (SP)+,D7
            UNLK    A5
            RTS

The listing above is exactly as it came out of ADIS. Note: most disassemblers put out SP instead of A7, but EC insists on A7. The listing shows that variable i is stored in D7, and that ADDQ.L #$1,D7 is equivalent to i++. Now try it again without the :REG:

    main    LINK    A5,#-$4
            NOP
            ADDQ.L  #$1,-$4(A5)
            NOP
            MOVEQ   #$0,D0
            UNLK    A5
            RTS

Now i is stored in memory 4 bytes below the address in A5, and ADDQ.L #$1,-$4(A5) is considerably slower than ADDQ.L #$1,D7. Note that the LINK instruction reserves 4 bytes on the stack to stow i in, and UNLK releases that stack space.

Here's a more complicated example:

    PROC main()
      DEF i, j=0
      NOP
      FOR i:=1 TO 100 DO j++
      NOP
    ENDPROC

which translates into:

    main    LINK    A5,#-$8
            MOVEQ   #$0,D0
            MOVE.L  D0,-$8(A5)
            NOP
            MOVEQ   #$1,D0
            MOVE.L  D0,-$4(A5)
    L$1B4   MOVEQ   #$64,D0
            CMP.L   -$4(A5),D0
            BMI.L   L$1CA
            ADDQ.L  #$1,-$8(A5)
            ADDQ.L  #$1,-$4(A5)
            BRA.L   L$1B4
    L$1CA   NOP
            MOVEQ   #$0,D0
            UNLK    A5
            RTS

If DEF i,j=0 is changed to DEF i:REG, j=0:REG the result is:

    main    LINK    A5,#$0
            MOVEM.L D6-D7,-(SP)
            MOVEQ   #$0,D0
            MOVE.L  D0,D6
            NOP
            MOVEQ   #$1,D0          /* set up loop */
            MOVE.L  D0,D7
    L$1B4   MOVEQ   #$64,D0
            CMP.L   D7,D0
            BMI.L   L$1C4           /* get out of loop */
            ADDQ.L  #$1,D6          /* this is j */
            ADDQ.L  #$1,D7          /* increment loop counter */
            BRA.L   L$1B4           /* go to top of loop */
    L$1C4   NOP
            MOVEQ   #$0,D0
            MOVEM.L (SP)+,D6-D7
            UNLK    A5
            RTS

This doesn't look much simpler than the previous version, but it runs a lot faster, since every access to i and j is now to a register instead of to memory.

So far we haven't written any assembly language. Let's improve on the line FOR i:=1 TO 100 DO j++, assuming that DEF i:REG, j:REG is in the program:

            MOVEQ   #99,i       /* these 3 instructions replace 8 above */
    loop:   ADDQ.L  #1,j
            DBRA    i,loop

That's all there is to it. First, 99 gets moved into some data register i; let EC worry about which one it is.
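To see why the three-instruction DBRA loop does the same work as EC's eight, here is a small C model of both loops. It is a sketch under stated assumptions: ec_for_loop and dbra_loop are hypothetical names, and C variables stand in for D6 and D7. DBRA is modelled the way the 68000 defines it: it decrements only the low 16 bits of the counter and falls through once that word becomes -1 ($FFFF).

```c
#include <stdint.h>

/* Model of the register version EC generates for FOR i:=1 TO 100 DO j++ :
   compare at the top, exit on BMI, otherwise j++, i++ and BRA back. */
static int32_t ec_for_loop(void)
{
    int32_t i = 1;                 /* MOVEQ #$1,D0 / MOVE.L D0,D7 */
    int32_t j = 0;                 /* j=0, held in D6 */
    for (;;) {
        int32_t d0 = 100;          /* MOVEQ #$64,D0 */
        if (d0 - i < 0)            /* CMP.L D7,D0 / BMI.L: exit when 100 - i is negative */
            break;
        j++;                       /* ADDQ.L #$1,D6 */
        i++;                       /* ADDQ.L #$1,D7 */
    }                              /* BRA.L back to the compare */
    return j;
}

/* Model of: MOVEQ #99,i / loop: ADDQ.L #1,j / DBRA i,loop
   DBRA decrements only the low word of the counter and branches back
   unless that word has just become -1 ($FFFF). */
static int32_t dbra_loop(int32_t start)
{
    uint32_t i = (uint32_t)start;  /* like MOVEQ #99,i (sign-extended constant) */
    int32_t j = 0;
    for (;;) {
        j++;                                   /* ADDQ.L #1,j */
        uint16_t low = (uint16_t)(i - 1u);     /* DBRA: decrement the low word only */
        i = (i & 0xFFFF0000u) | low;
        if (low == 0xFFFFu)                    /* counter reached -1: fall through */
            break;
    }                                          /* otherwise branch back to loop */
    return j;
}
```

Both ec_for_loop() and dbra_loop(99) return 100. dbra_loop(0) returns 1, because the body always runs once before DBRA looks at the counter.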
It's correct to put in 99 rather than 100 because DBRA decrements i and branches back to loop unless i has just become -1, so the loop body runs one more time than the starting value of i. The loop is traversed 100 times, each time adding 1 to j, again letting EC worry about which data register holds j.

For the real speed fanatics, it is sometimes practical to put the code for a function in line, even if it is code provided by a module or by EC itself. The StrLen function is a good example. Note that this is StrLen, not EstrLen, and it operates on any null-terminated string. The advantage of putting a function in line is that pushing arguments, branching, and returning take a lot of time, and that time is saved.

            MOVEQ   #-1,D0
            MOVEA.L str,A0
    loop:   TST.B   (A0)+
            DBEQ    D0,loop
            NOT.L   D0

Here str is the string whose length you want, and D0 holds that length when the instructions are completed. DBEQ exits the loop as soon as TST.B finds the zero byte and sets the zero flag. At that point D0 is some negative number: -1 minus the number of characters before the zero. It would seem reasonable to use NEG on D0, since NEG does a two's complement negation, but that gives the length plus one. NOT, the one's complement, gives the correct answer. DBEQ counts with only the low word of D0, so this works for strings shorter than 65536 bytes. Putting this code in line does not increase the size of the program.

Another extremely small function is strcopy, assuming here that str and newstr are either Estrings or strings, i.e. ARRAY OF CHAR:

            MOVEA.L str,A0
            MOVEA.L newstr,A1
    loop:   MOVE.B  (A0)+,(A1)+
            BNE.S   loop

MOVE.B sets the zero flag from the byte it moves, so the loop stops right after copying the terminating zero. Nothing checks that newstr is big enough.

For the ultimate in speed, where you want a very small string constant shoved into a string, try some variation on this:

            MOVEA.L str,A0
            MOVE.L  #"abc\0",(A0)

It is not my intention to teach assembly language, just to get someone who knows a little assembly language started on incorporating it into E. Beginners will find it easier to become proficient this way than by trying to write a complete assembly language program from scratch.
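The three in-line routines above can be checked against a C model of the same instructions. This is only a sketch. The names dbeq_loop, not_l, neg_l, byte_copy, store_be32 and ABC0 are all hypothetical, and DBEQ is modelled with the 68000's 16-bit loop counter. ABC0 assumes that "abc\0" packs its first character into the most significant byte, which the big-endian 68000 then stores at the lowest address. One caution the model cannot show: on a plain 68000, a MOVE.L to an odd address causes an address error, so the destination must start at an even address.

```c
#include <stdint.h>

/* Model of MOVEQ #-1,D0 / loop: TST.B (A0)+ / DBEQ D0,loop.
   Returns D0 as it is when the loop ends. */
static uint32_t dbeq_loop(const char *a0)
{
    uint32_t d0 = 0xFFFFFFFFu;                 /* MOVEQ #-1,D0 */
    for (;;) {
        if (*a0++ == '\0')                     /* TST.B (A0)+ sets Z on a zero byte */
            break;                             /* DBEQ: Z set, leave without decrementing */
        uint16_t low = (uint16_t)(d0 - 1u);    /* otherwise decrement the low word */
        d0 = (d0 & 0xFFFF0000u) | low;
        if (low == 0xFFFFu)                    /* word counter ran out after 65536 bytes */
            break;
    }
    return d0;                                 /* -1 - length for shorter strings */
}

static uint32_t not_l(uint32_t d0) { return ~d0; }      /* NOT.L D0 (one's complement) */
static uint32_t neg_l(uint32_t d0) { return 0u - d0; }  /* NEG.L D0, for comparison */

/* Model of loop: MOVE.B (A0)+,(A1)+ / BNE.S loop. MOVE.B sets Z from the byte
   it moves, so the terminating zero is copied and then the loop stops.
   Like the assembly, this does not check the size of the destination. */
static void byte_copy(char *a1, const char *a0)
{
    char c;
    do {
        c = *a0++;
        *a1++ = c;
    } while (c != '\0');
}

/* Hypothetical name for the long constant "abc\0": first character in the top byte. */
#define ABC0 (((uint32_t)'a' << 24) | ((uint32_t)'b' << 16) | ((uint32_t)'c' << 8))

/* Model of MOVE.L #value,(A0) on the big-endian 68000: the most significant
   byte goes to the lowest address. */
static void store_be32(unsigned char *a0, uint32_t value)
{
    a0[0] = (unsigned char)(value >> 24);
    a0[1] = (unsigned char)(value >> 16);
    a0[2] = (unsigned char)(value >> 8);
    a0[3] = (unsigned char)value;
}
```

For "abc", dbeq_loop leaves D0 at $FFFFFFFC (that is, -4). not_l gives 3, while neg_l gives 4.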
The two books I have found most useful in getting started with assembly language are Programming the 68000 by Steve Williams (SYBEX) and Amiga Machine Language by Stefan Dittrich (Abacus).

A probably unnecessary word of caution: if you program in assembly language, the Guru is coming! You might minimize its impact a bit by working in RAD: or some other variety of supposedly recoverable RAM device. In particular, be wary of any program which opens a hard disk file and writes to it. It's annoying to have to reconstruct a few megabytes of hard disk files.

Beware the incredibly slow RawDoFmt!!! Back in the early days, around 1986, when the Amiga 1000 was just called the Amiga, I wrote a disassembler, in BASIC of all things. Later I translated it into C, using Matt Dillon's DICE, then into Pascal, using Pat Quaid's PCQ Pascal, and finally into E. A test file of about 20K was processed in about 19 seconds with DICE, in about 15 seconds with PCQ, and in about 44 seconds with E. Not good!! After quite a bit of detective work, the problem was traced to the above-mentioned RawDoFmt, which lives in exec.library. E uses it in WriteF and StringF; PCQ and DICE don't! I had put together a stringf function for PCQ, using mostly my own code with a bit of help from a public-domain itoa (integer to string) function. I later passed it on to Joe Siebenmann for his EZAsm. Most recently I rewrote it as a module for E. The moral of the story is that my disassembler now processes the same test file in 11 seconds. That makes it a clear winner over PCQ, which uses essentially the same stringf, and over DICE, which doesn't. The speed increase is attributable entirely to the stringf function, which is at least 15 times as fast as StringF. Incidentally, I later translated the disassembler into EZAsm, and the same file ran in 3 seconds, but that's a different story. If you need a fast stringf function, here they are, all four of them.
Four, because there are two versions, and each comes in two variants: one for the 68000 and one for the 68020 or above. My Amiga 1200 has the 68020 chip, so I figured that I might as well take advantage of the DIVU.L instruction. It doesn't have any noticeable effect on speed, but it makes the module about 80 bytes smaller, for whatever that's worth.

Now for the two versions: qstringf.m, and of course qstringf20.m, run about twice as fast as stringf.m and stringf20.m, but don't do as much. The q versions handle the format codes \n, \c, \d, \h and \s. The versions without q also handle \l, \r, \z, [n] and (m,n). In other words, in return for the lost speed you get left and right justification, padding with leading zeros, field widths, and maximum and minimum lengths for strings. For most purposes, the quick version will get the job done faster with smaller code. Don't try to include more than one version as a module in a program, since all four define the same names: each has a procedure called stringf, but each one is different. As with the real StringF, the output string will be an Estring, and the return values are the same as for StringF.

How to use stringf: just like StringF, EXCEPT that the data stream must be a list!! Example:

    StringF(str,format,datastream)         becomes  stringf(str,format,[datastream])
    StringF(str,'\d \h \s\n',3,5,'abc')    becomes  stringf(str,'\d \h \s\n',[3,5,'abc'])

If you forget the square brackets, you will get an error message when there is more than one argument in the data stream. With just one argument, stringf will assume that the argument is a pointer to a list, with unpredictable results (potential Guru?). Incidentally, all versions have an added capability: they can print binary representations, requested by putting %lb in the format string.

If you get your kicks by translating C programs into E, you may not be aware that you can use the format strings unchanged in most cases.
For instance, EC translates \h into %lx before passing it to RawDoFmt, so why not just leave it as %lx in the first place? To see what EC does with the various \ options, use the technique of creating an executable file and disassembling it. Don't bother with the NOP business, because the strings will be somewhere else, probably near the end of the file.

    StringF(s,'xxx\h[2]xxx\d[2]xxx\n',456,456)
    WriteF('string is \s',s)

According to my C manual, if the field specification doesn't provide enough space to print an entire number, the whole number is printed anyway. RawDoFmt won't do that; my stringf will. Try the above: it should print three digits for each number (456 is $1C8 in hex), but it prints only two.

The stringf20.e file is an example of an E file which has been almost fully translated from E into assembly language. Stringf20.doc is really an early version of stringf20.e, from when the translation process was just getting started. By comparing the two files, you can see how many E structures come out in assembly language, such as REPEAT:UNTIL loops, FOR loops, SELECT:CASE, and various IF statements.
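Two stringf behaviours described above can be sketched in C: a data stream passed as one list, and C's rule that a field width is only a minimum. This is a hypothetical illustration, not the author's module. list_format, put, put_field and to_base are invented names, and an intptr_t array stands in for the E list. Only %ld, %lx, %s and the %lb binary extension are supported, each with an optional decimal width. For illustration it assumes that \h[2] and \d[2] mean what %2lx and %2ld mean in C. To find out what EC really emits, disassemble the program.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append one character, always leaving room for the terminating zero. */
static void put(char *out, size_t cap, size_t *pos, char c)
{
    if (*pos + 1 < cap)
        out[*pos] = c;
    (*pos)++;
}

/* Append text right-justified in a field of at least `width` characters.
   As in C, the width is a minimum: longer text is never cut short. */
static void put_field(char *out, size_t cap, size_t *pos, const char *text, size_t width)
{
    size_t len = strlen(text);
    while (len < width) {          /* pad on the left only */
        put(out, cap, pos, ' ');
        width--;
    }
    while (*text != '\0')
        put(out, cap, pos, *text++);
}

/* Convert v to base 2, 10 or 16 text inside buf; returns where the text starts. */
static const char *to_base(char buf[34], int32_t v, unsigned base)
{
    uint32_t mag = (uint32_t)v;
    int negative = (base == 10 && v < 0);
    char *p = buf + 33;
    *p = '\0';
    if (negative)
        mag = 0u - mag;
    do {
        *--p = "0123456789abcdef"[mag % base];
        mag /= base;
    } while (mag != 0);
    if (negative)
        *--p = '-';
    return p;
}

/* Hypothetical list-driven formatter: args points to the data stream,
   one slot per argument, like the E list [3,5,'abc']. */
static size_t list_format(char *out, size_t cap, const char *fmt, const intptr_t *args)
{
    size_t pos = 0;
    char num[34];
    while (*fmt != '\0') {
        if (*fmt != '%') {
            put(out, cap, &pos, *fmt++);
            continue;
        }
        const char *spec = fmt + 1;
        size_t width = 0;
        while (*spec >= '0' && *spec <= '9')
            width = width * 10 + (size_t)(*spec++ - '0');
        if (*spec == 's') {
            put_field(out, cap, &pos, (const char *)*args++, width);
            fmt = spec + 1;
        } else if (spec[0] == 'l' && (spec[1] == 'd' || spec[1] == 'x' || spec[1] == 'b')) {
            unsigned base = spec[1] == 'd' ? 10u : spec[1] == 'x' ? 16u : 2u;
            put_field(out, cap, &pos, to_base(num, (int32_t)*args++, base), width);
            fmt = spec + 2;
        } else {
            put(out, cap, &pos, *fmt++);   /* unknown code: copy the % literally */
        }
    }
    if (cap > 0)
        out[pos < cap ? pos : cap - 1] = '\0';
    return pos;                            /* length the complete result needs */
}
```

With the data stream {456, 456} and the format xxx%2lxxxx%2ldxxx, the sketch produces xxx1c8xxx456xxx. The width only pads; it never truncates a number.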